Abstract

We identify principles characterizing Solomonoff Induction by demands on 
an agent's external behaviour. Key concepts are rationality, computability, 
indifference and time consistency. Furthermore, we discuss extensions to the 
full AI case to derive AIXI. 
1 Introduction



Ray Solomonoff [Sol60] introduced a universal sequence prediction method that in
[Sol96, Hut07, RH11] is argued to solve the general induction problem. [Hut05]
extended Solomonoff induction to the full AI (general reinforcement learning)
setting, where an agent takes a sequence of actions that may affect the unknown
environment in order to achieve as large an amount of reward as possible. The
resulting agent was named AIXI. Here we take a closer look at what principles
underlie Solomonoff induction and the AIXI agent. We derive Solomonoff induction
from four general principles and discuss how AIXI follows from extended versions
of the same.

Our setting consists of a reference universal Turing machine (UTM), a binary
sequence (produced by an environment program, not revealed, on the reference
machine) fed incrementally to the agent, and a loss function (or reward
structure). We give the agent the task of choosing a program for the reference
machine so as to minimize the loss. The loss is in general defined to be a
function from a pair of programs, an environment program and an agent program,
to real numbers. The loss function can be such that only the prediction (for a
certain number of bits) produced by the program matters, or it can care about
exactly which program was presented. A loss function of the former kind leads to
the agent performing the task of prediction, which is what Solomonoff induction
is primarily concerned with, while the latter can be viewed as identifying an
explanatory hypothesis, which is more closely related to the minimum message
length principle [WB68, WD99, Wal05] or the minimum description length principle
[Ris78, Grü07, Ris10]. Solomonoff induction uses a mixture of hypotheses to
achieve the best possible prediction. Note that the fact that we pick one
program does not rule out that the choice is internally based on a mixture. In
the case when the loss only cares about the prediction, the program is only a
representation of that prediction and not really a hypothesis.

The principles are designed to avoid stating what the internal workings of the
agent should be and instead derive those as a consequence of the demands on the
behaviour. Thus we demand rationality instead of stating explicitly that the
agent should have probabilistic beliefs, and we demand time consistency instead
of explicitly stating probabilistic conditioning. The computability principle
avoids saying that the agent should have a hypothesis class that consists of all
computable environments; instead it demands that the agent deliver a computation
procedure (a program for our reference machine) that produces its prediction for
the next few bits. The indifference principle states what the initial
preferences of the agent must be, i.e. a demand for how the initial decision
should be taken. The choice is based on symmetry with respect to a chosen
representation scheme for sequences, e.g. programs on a reference machine. In
other words we do not allow the agent to be biased in a certain sense that
depends on our reference machine. Informally we state the principles as follows:
1. Computability: If we are going to guess the future of a sequence, we should
choose a computation procedure (a program for the reference machine) that
produces the predicted bits.

2. Rationality: We should choose our predicted sequence such that the depen- 
dence on the priorities (formalized by a reward (or loss) structure) is consistent. 

3. Indifference: The initial choice between programs only depends on their 
length and the priorities (again formalized by reward (or loss)) 

4. Time Consistency: The choice of program does not change by a new
observation if the program's output is consistent with the observation and the
reward structure is still the same and concerned with the same bits.

Our reasoning leading from external behavioural principles to a completely
defined internal procedure can be summarized as follows: The rationality
principle tells us that we need to have probabilistic beliefs over some set of
alternatives; the computability principle tells us what the alternatives are,
namely programs; the indifference principle leads to a choice of the original
beliefs; the time-consistency principle leads to a simple procedure for updating
the beliefs that the rationality principle tells us must exist, namely
conditioning. In total it leads to Solomonoff Induction.

We cannot remove any of the principles without losing the complete specification
of a procedure. The first principle is part of the setup of what we ask the agent
to do. Without the second we lose the restriction that we take decisions based on
maximum expected utility with respect to probabilistic beliefs, and one could
then have an agent that always chose the same program (e.g. a very short one).
Without the third principle we could have any a priori beliefs, and without the
fourth the agent could after a while change its mind regarding what beliefs it
started with.

1.1 Setup 

We are considering a setting where we give an agent a task that is defined by a 
reference machine (a UTM), a reward structure (or loss function if we negate) and a 
binary sequence that is presented one bit at a time. The binary sequence is generated 
by a program for the reference machine. 

The agent must (as stated by the first principle) choose a program for the
reference machine, whose output must be consistent with anything that we have
already observed, and then use its output (which can be of finite or infinite
length) as a prediction. If we want to predict at least $h$ bits we have to
restrict ourselves to machines that output at least $h$ bits. We will consider an
enumeration of all programs $T_i$. We are also going to consider a class of
reward structures $R_{i,j}$. The meaning is that if we guess that the sequence is
(as the output of) $T_i$ and the actual sequence is $T_j$, then we receive reward
$R_{i,j}$. Note that for any finite string there are always Turing machines that
compute it. We will furthermore suppose that $\forall i$, $R_{i,j} \to 0$ as
$j \to \infty$. This means that we consider it to be a harder and harder task to
guess $T_j$ as $j$ gets really large. This assumption is not strictly necessary
as we will discuss later.

1.2 Outline 

Section 2 provides background on Solomonoff induction and AIXI. In Section 3
we deal with the first two principles mentioned above about rationality and
computability. In Section 4 we discuss the third principle which defines a prior
from a (Universal Turing Machine) representation. Section 5 describes the
sequence prediction algorithm that results from adding the fourth principle to
what has been achieved in the previous sections. Section 6 extends our analysis
to the case where an agent takes a sequence of actions that may affect its
environment. Section 7 concerns equivalence between our beliefs over
deterministic environments and beliefs over a much larger class of stochastic
environments.

2 Background 

2.1 Sequence Prediction 

We consider both finite and infinite sequences from a finite alphabet $X$. We
denote the finite strings by $X^*$ and we use the notation
$x_{1:t} := x_1, x_2, \ldots, x_t$ for the first $t$ elements in a sequence $x$.
A function $p : X^* \to [0, 1]$ is a probability measure if

$$p(x) = \sum_{a \in X} p(xa) \quad \forall x \in X^* \qquad (1)$$

and $p(\epsilon) = 1$ where $\epsilon$ is the empty string. Such a function
describes a priori probabilistic beliefs about the sequence. If the equality in
(1) is instead $\geq$ and $p(\epsilon) \leq 1$ then we have a semi-measure. We
define the probability of seeing the string $a$ after seeing $x$ as being
$p(a|x) := p(xa)/p(x)$. If we have a loss function
$L : X \times X \to \mathbb{R}$, we ([Hut07]) choose, after seeing the string
$x$, to predict

$$\arg\min_{a \in X} \sum_{b \in X} L(a, b)\, p(b|x). \qquad (2)$$

More generally, if we have an alphabet $Y$ of actions we can take and a loss
function $L : Y \times X \to \mathbb{R}$ we make the choice

$$\arg\min_{a \in Y} \sum_{b \in X} L(a, b)\, p(b|x). \qquad (3)$$
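The decision rule in (2) and (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the measure `bernoulli` (independent bits with probability 0.7 of a 1), the 0/1 loss and the function names are hypothetical choices made here, not part of the paper.

```python
from typing import Callable, Sequence


def conditional(p: Callable[[str], float], x: str, b: str) -> float:
    """p(b|x) := p(xb) / p(x), as defined in Section 2.1."""
    return p(x + b) / p(x)


def predict(p, loss, x: str, actions: Sequence[str], alphabet: Sequence[str]) -> str:
    """Return the action a that minimises sum_b L(a, b) p(b|x)."""
    def expected_loss(a: str) -> float:
        return sum(loss(a, b) * conditional(p, x, b) for b in alphabet)
    return min(actions, key=expected_loss)


# Hypothetical toy measure: independent bits with Pr(1) = 0.7.
def bernoulli(x: str) -> float:
    prob = 1.0
    for bit in x:
        prob *= 0.7 if bit == "1" else 0.3
    return prob


def zero_one(a: str, b: str) -> float:
    """Loss 0 for a correct guess, 1 for a wrong one."""
    return 0.0 if a == b else 1.0


guess = predict(bernoulli, zero_one, "0110", actions="01", alphabet="01")
```

With the 0/1 loss the rule reduces to guessing the more probable next symbol, so `guess` is `"1"` here. Passing a different `actions` alphabet gives the more general choice (3).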

2.2 The Solomonoff Prior 

Ray Solomonoff [Sol60] defined a set of priors that only differ by a
multiplicative constant. We call them Solomonoff priors. To define them we need
to first introduce some notions about Turing machines [Tur36].
A monotone Turing machine $T$ (which we will just call Turing machine and
whose exact technical definition can be found in [LV08]) is a function from a set
of (binary) strings to binary sequences that can either be finite or infinite. We
demand that it be possible to describe the function as a machine with
unidirectional input and output tapes, read/write heads, a bidirectional work
tape and a finite state machine that decides the next action of the machine given
the symbols under the head on the input and work tape. The input tape is read
only and the output tape is write only. We write that $T(p) = x*$ if the output
of $T$ starts with $x$ when given input (program) $p$.

A universal Turing machine is a Turing machine that can emulate all other 
Turing machines in the sense that for every Turing machine T there is at least one 
prefix p, such that when px is fed to the universal Turing machine, it computes the 
same output as T would when fed x (see [LV08, Hut05] for further details).

A sequence is called computable if some Turing machine outputs it, or in other 
words, if for every universal Turing machine there is a program p that leads to this 
sequence being the output. 

We can also define what we will call a computable environment from a Turing 
machine. A computable environment is something which you (an agent) feed an 
action to and the environment outputs a string which we call a perception. We can 
for example have a finite number of possible actions and we put one after another on 
the input tape of the machine. We wait until the previous input has been processed 
and one of finitely many outputs has been produced. The machine might halt after 
a finite number of actions have been processed or it might run for ever. 

Definition 1 (Semi-measure from Turing machine). Given a Turing machine $T$, we
let

$$\lambda_T(x) := \sum_{p \,:\, T(p) = x*} 2^{-l(p)} \qquad (4)$$

where $l(p)$ is the length of the program (input) $p$ and $T(p) = x*$ means that
$T$ starts with outputting $x$ when fed $p$, though it might continue and output
more afterwards.

If the Turing machine $T$ in Definition 1 is universal we call $\lambda_T$ a
Solomonoff distribution. Solomonoff induction is defined by letting $p$ in
Section 2.1 be the Solomonoff prior for some universal Turing machine. If $U$ is
a universal Turing machine and $T$ is any Turing machine there exists a constant
$c > 0$ (namely $2^{-l(q)}$ where $q$ is the prefix that encodes $T$ in $U$) such
that

$$\lambda_U(x) \geq c\, \lambda_T(x) \quad \forall x \in X^*. \qquad (5)$$

The set $\{\lambda_T \mid T \text{ Turing machine}\}$ can be identified [LV08]
with the set of all lower semi-computable semi-measures (see [LV08] for
definitions and proofs). The property expressed by (5) is called universality (or
dominance) and is the key to proving the strong convergence results of Solomonoff
Induction [Sol78, LV08, Hut05, Hut07].
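Definition 1 and the dominance property (5) can be illustrated with a toy sketch. The assumptions are strong and should be read as such: a "machine" here is just a finite, prefix-free lookup table from programs to finite outputs, and `TOY_EMULATOR` is not universal; it merely emulates `TOY_MACHINE` behind the hypothetical prefix `q = "01"`. Real monotone machines are neither finite nor tabulated like this.

```python
from fractions import Fraction

# Hypothetical toy machine T: prefix-free programs mapped to their outputs.
TOY_MACHINE = {
    "0": "0000",
    "10": "0101",
    "110": "0011",
    "111": "0100",
}

# Hypothetical machine that emulates T behind the prefix q, plus other programs.
Q = "01"
TOY_EMULATOR = {Q + prog: out for prog, out in TOY_MACHINE.items()}
TOY_EMULATOR.update({"00": "1010", "1": "1111"})


def semi_measure(machine: dict, x: str) -> Fraction:
    """lambda_T(x): sum of 2^{-l(p)} over programs p whose output starts with x."""
    return sum(
        (Fraction(1, 2 ** len(prog)) for prog, out in machine.items()
         if out.startswith(x)),
        Fraction(0),
    )


c = Fraction(1, 2 ** len(Q))  # the constant 2^{-l(q)} from (5)
prefixes = ["", "0", "00", "01", "010", "0011", "1"]
dominates = all(
    semi_measure(TOY_EMULATOR, x) >= c * semi_measure(TOY_MACHINE, x)
    for x in prefixes
)
```

In this toy table `semi_measure(TOY_MACHINE, "00")` is 1/2 + 1/8 = 5/8, and `dominates` is true because every program of the emulated machine reappears, two bits longer, in the emulating one.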






2.3 AIXI 



In the active case where an agent is taking a sequence of actions to achieve some
sort of objective, we are trying to determine the best policy $\pi$, defined as a
function from a history $a_1 q_1, \ldots, a_t q_t$ of actions $a_t$ and
perceptions $q_t$ to a choice of the next action $a_{t+1}$. The function $p$ from
the sequence prediction case is in the active case of the form
$p(q_1, \ldots, q_t \mid a_1, \ldots, a_t)$ and represents the probability of
seeing $q_1, \ldots, q_t$ given that we have chosen actions $a_1, \ldots, a_t$.
We can again define a "learning" algorithm by conditioning on what we have seen
to define

$$p(q_{t+1}, \ldots, q_{t+k} \mid q_1, \ldots, q_t, a_1, \ldots, a_{t+k}) := \frac{p(q_1, \ldots, q_{t+k} \mid a_1, \ldots, a_{t+k})}{p(q_1, \ldots, q_t \mid a_1, \ldots, a_t)}. \qquad (6)$$

If $a_t = \pi(a_1 q_1, \ldots, a_{t-1} q_{t-1})\ \forall t$ and
$q = q_1, q_2, \ldots$, then we also write $p(q|\pi)$ for the left hand side in
(6).

Suppose that we have an enumerated set of policies $\{\pi_i\}$ to choose from.
Given a definition of reward $R(q)$ for a sequence of percepts
$q = q_1, q_2, \ldots$, which can for example be defined as in reinforcement
learning by splitting $q_t$ into observation $o_t$ and reward $r_t$ and using a
discounted reward sum $\sum_t \gamma^t r_t$ [SB98, Hut05], we can define

$$R(\pi) := E_\pi R(q) := \sum_q R(q)\, p(q|\pi) \qquad (7)$$

and make the choice

$$\pi^* := \arg\max_\pi R(\pi). \qquad (8)$$

If we have a class of environments $\{T_j\}$ (say the computable environments)
and if $p$ is defined by saying that we assign probability $p_j$ to $T_j$ being
the true environment, then we let $R_{i,j} = R(q)$ if $q$ is the sequence of
perceptions resulting from using policy $\pi_i$ in environment $T_j$. Then
$R(\pi_i) = \sum_j p_j R_{i,j}$ and we choose the policy with index

$$\arg\max_i \sum_j p_j R_{i,j}. \qquad (9)$$

As outlined in [Hut05], one can choose a Solomonoff distribution also over active
environments. The resulting agent is referred to as AIXI.
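The policy choice (9) can be sketched for a tiny, hypothetical setting. Everything concrete below is an assumption for illustration: two deterministic toy environments that return only a reward (the observation part of the percept is omitted), two constant policies, a finite horizon, discount 0.5 and hand-picked weights. AIXI itself uses a Solomonoff distribution over all computable environments, which this sketch does not attempt.

```python
from typing import Callable, List, Tuple

History = List[Tuple[int, float]]  # (action, reward) pairs seen so far


def discounted_return(policy: Callable, env: Callable, horizon: int, gamma: float) -> float:
    """R_{i,j}: discounted reward sum_t gamma^t r_t of one policy in one environment."""
    history: History = []
    total = 0.0
    for t in range(1, horizon + 1):
        action = policy(history)
        reward = env(history, action)
        total += gamma ** t * reward
        history.append((action, reward))
    return total


def best_policy_index(policies, envs, weights, horizon=3, gamma=0.5):
    """Return (argmax_i sum_j p_j R_{i,j}, list of the expected values)."""
    values = [
        sum(p_j * discounted_return(pi, env, horizon, gamma)
            for env, p_j in zip(envs, weights))
        for pi in policies
    ]
    return max(range(len(policies)), key=values.__getitem__), values


# Hypothetical policies and environments.
always_0 = lambda history: 0
always_1 = lambda history: 1
likes_1 = lambda history, action: 1.0 if action == 1 else 0.0
likes_0 = lambda history, action: 1.0 if action == 0 else 0.0

best, values = best_policy_index(
    [always_0, always_1], [likes_1, likes_0], weights=[0.75, 0.25]
)
```

Since the environment that rewards action 1 carries weight 0.75, the policy with index 1 (`always_1`) has the larger expected discounted reward.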



3 Choosing a Program 

In this section we describe the setup of the second principle mentioned in the
introduction, namely rationality. The section is much briefer than what is
suitable for the topic and we refer the reader to our companion paper [SH11] for
a more comprehensive treatment. Rationality is meant in the sense of internal
consistency [Sug91], which is how it has been used in [NM44] and [Sav54]. We set
up simple axioms for a rational decision maker, which imply that the decisions
can be explained (or defined) from probabilistic beliefs. The approach to
probability by [Ram31, deF37] interprets probabilities as fair betting odds.
There is an intuitive similarity between our setup and the idea of
explaining/deriving probabilities as a bookmaker's betting odds as done in
[deF37] and [Ram31].

Before we consider the question regarding which program we want to choose we
will first consider the question of whether we are prepared to accept guessing
$T_i$ for a given $R = \{R_{i,j}\}$ (i.e. accepting this bet). We suppose that
the alternative is to abstain (reject) and receive zero reward. We introduce
rationality axioms and prove that we must have probabilistic beliefs over the
possible sequences. Note that for any given $i$, we have a sequence $R_{i,j}$ in
$c_0$ (the space of real valued sequences that converge to 0). We will set up
some common sense rationality axioms for the way we make our decisions. We will
demand that a decision can be taken for any reward structure $r$ ($R_{i,j}$ with
fixed $i$) from $c_0$. If $r$ is acceptable and $\lambda \geq 0$ then we want
$\lambda r$ to be acceptable since this is simply a multiple of the same. We also
want the sum of two acceptable reward structures to be acceptable. If we cannot
lose (receive negative reward) we are prepared to accept, while if we are
guaranteed to gain we are not prepared to reject. We cannot remove any axiom
without losing the conclusion.

Definition 2 (Rationality). Suppose that we have a function
$z : c_0 \to \{-1, 1, 0\}$ defining the decision reject/accept/either
($-1/1/0$) and $Z = \{r \in c_0 \mid z(r) \in \{0, 1\}\}$.

1. $z(r) \in \{0, 1\}$ if and only if $z(-r) \in \{-1, 0\}$

2. If $r, s \in Z$ and $\lambda, \gamma \geq 0$ then $\lambda r + \gamma s \in Z$

3. If $r_k \geq 0\ \forall k$ then $r \in Z$, while if $r_k > 0\ \forall k$ then
$z(r) = 1$.

The following theorem connects our Rationality axioms with the Hahn-Banach
theorem [Kre89] and concludes that rational decisions can be described with a
positive continuous linear functional on the space of reward structures. The
Banach space dual of $c_0$ is $\ell_1$ which gives us a probabilistic
representation of underlying beliefs.

Theorem 3 (Linear separation). Given the assumptions in Definition 2 there exists
a positive continuous linear functional $f : c_0 \to \mathbb{R}$ defined by
$f(r) = \sum_j r_j p_j$ where $r = \{r_j\}$, $p_j \geq 0$ and
$\sum_j p_j < \infty$, such that

$$\{r \mid f(r) > 0\} \subseteq Z \subseteq \{r \mid f(r) \geq 0\}. \qquad (10)$$

Proof. The second property tells us that $Z$ and $-Z$ are convex cones. The first
and third property tell us that $Z \neq \mathbb{R}^m$. Suppose that there is a
point $r$ that lies in both the interior of $Z$ and of $-Z$. Then the same is
true for $-r$ according to the first property and for the origin. That a ball
around the origin lies in $Z$ means that $Z = \mathbb{R}^m$ which is not true.
Thus the interiors of $Z$ and $-Z$ are disjoint open convex sets and can,
therefore, be separated by a hyperplane (according to the Hahn-Banach theorem)
which goes through the origin (since according to the first and third property
$z(0) = 0$). The first property tells us that $Z \cup -Z = \mathbb{R}^m$. Given a
separating hyperplane (between the interiors of $Z$ and $-Z$), $Z$ must contain
everything on one side. This means that $Z$ is a half space whose boundary is a
hyperplane that goes through the origin, and the closure $\bar{Z}$ of $Z$ is a
closed half space that can be written as $\{r \mid f(r) \geq 0\}$ for some $f$ in
the Banach space dual $c_0' = \ell_1$ of $c_0$. The third property tells us that
$f$ is positive. □

Theorem 3 also leads us to how to choose between different options. If we
consider picking $T_i$ over $T_k$ we will do (accept) that if
$R_{i,\cdot} - R_{k,\cdot}$ is accepted. This is the case if
$\sum_j p_j R_{i,j} > \sum_j p_j R_{k,j}$. The conclusion is that if we are
presented with $R_{i,j}$ and a class $\{T_j\}$ and we assign probability $p_j$ to
$T_j$ being the truth, then we choose

$$\arg\max_i \sum_j R_{i,j}\, p_j. \qquad (11)$$
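The link between accepting a bet and choosing a program can be made concrete with a small sketch. It assumes a finite truncation of the reward sequences, hand-picked beliefs `p`, and one particular tie rule (returning 0, "either", exactly on the boundary $f(r) = 0$); Theorem 3 leaves the behaviour on that boundary open, so this tie rule is an illustrative choice, not something the theorem prescribes.

```python
from fractions import Fraction


def f(r, p) -> Fraction:
    """Linear functional f(r) = sum_j r_j p_j from Theorem 3."""
    return sum((r_j * p_j for r_j, p_j in zip(r, p)), Fraction(0))


def decide(r, p) -> int:
    """1 = accept, -1 = reject, 0 = either (assumed tie rule on f(r) = 0)."""
    value = f(r, p)
    return (value > 0) - (value < 0)


def choose(R, p) -> int:
    """Index i maximising sum_j R[i][j] * p[j], as in (11)."""
    return max(range(len(R)), key=lambda i: f(R[i], p))


# Hypothetical beliefs and reward structure: +1 for a correct guess, -1 otherwise.
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
R = [[1, -1, -1],
     [-1, 1, -1],
     [-1, -1, 1]]

best = choose(R, p)
difference = [a - b for a, b in zip(R[0], R[1])]  # the bet R_{0,.} - R_{1,.}
```

Here `best` is 0, and the bet `difference` is accepted (`decide(difference, p) == 1`), which is the statement that option 0 is preferred over option 1.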

Remark 4. If we replace the space $c_0$ by $\ell_\infty$ as the space of reward
structures in Theorem 3, the conclusion (see [SH11]) is instead that $f$ is in
the Banach space dual $\ell_\infty'$ of $\ell_\infty$, which contains $\ell_1$
(the countably additive measures) but also functions that cannot be written in
the form $f(r) = \sum_j r_j p_j$. $\ell_\infty'$ is sometimes called the ba space
[Die84] and it consists of all finitely additive measures.



4 Representation 

In this section we will discuss how indifference together with a representation leads 
to a choice of prior weights. The representation will be given in terms of codes that 
are strings of letters from a finite alphabet and it tells us which distinctions we will 
apply our indifference principle to. Choosing the first bit can be viewed as choosing 
between two propositions, e.g. x is a vegetable or x is a fruit. More choices follow 
until a full specification (a code word for the given reference machine) is reached. 
The section describes the usual material on the Solomonoff distribution (see [LV08])
in a way that highlights in what sense it is based on indifference. The indifference 
principle itself is an external behavioural principle. 

Definition 5 (Indifference). Given a reward structure for two alternative
outcomes of an event where we receive $R_1$ or $R_2$ depending on the outcome,
then if we are indifferent we accept this bet if $R_1 + R_2 > 0$. For an agent
with probabilistic beliefs that maximizes expected utility this means that equal
probability is assigned to both possibilities.

We will discuss examples that are based on considering the set {apple, orange, 
carrot} and the representation that is defined by first separating fruit from vegetables 
and then the fruits into apples and oranges. 

Example 6. We are about to open a box within which there is either a fruit or a 
vegetable. We have no other information (except possibly, a list of what is a fruit 
and what is a vegetable). 






Example 7. We are about to open a box within which there is either an apple, or 
an orange or a carrot. We have no other information. 

Consider a representation where we use binary codes. If the first digit is a 0
it means a vegetable, i.e. a carrot. No more digits are needed to describe the
object. If the first digit is a 1 it means a fruit. If the next digit after the 1
is a 0 it is an apple and if it is a 1 it is an orange. In the absence of any
other background knowledge/information and given that we are going to be
indifferent for this choice, we assign uniform probabilities for each choice of
letter in the string. For our examples this results in probabilities
Pr(fruit) = Pr(vegetable) = 1/2. After concluding this we consider the next
distinction and conclude that Pr(apple|fruit) = Pr(orange|fruit) = 1/2. This
means that the decision maker has the prior beliefs Pr(carrot) = 1/2,
Pr(apple) = Pr(orange) = 1/4.

An alternative representation would be to have a ternary alphabet and give each
object its own letter. The result of this is Pr(apple) = Pr(orange) = Pr(carrot) =
1/3, Pr(fruit) = 2/3 and Pr(vegetable) = 1/3.

The following formalizes the definition of a code and a prefix free code. Since
we are assuming that the possible outcomes are never special cases of each other
we need our code to be prefix free. Furthermore, Kraft's inequality says that
$\sum_{c \in C} 2^{-\mathrm{length}(c)} \leq 1$ if the set of codes $C$ is prefix
free.

Definition 8 (Codes). A code for a set A is a set of strings C of letters from a 
finite alphabet B and a surjective map from C to A. We say that a code is prefix-free 
if no code string is a proper prefix of another. 
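A prefix-free check and the Kraft sum are easy to sketch. The helper names below are hypothetical, and the check relies on the standard observation that, after lexicographic sorting, any string that is a prefix of another is also a prefix of its immediate successor.

```python
from fractions import Fraction
from typing import Iterable


def is_prefix_free(codes: Iterable[str]) -> bool:
    """True if no code string is a proper prefix of another code string."""
    ordered = sorted(set(codes))
    return all(not later.startswith(earlier)
               for earlier, later in zip(ordered, ordered[1:]))


def kraft_sum(codes: Iterable[str], base: int = 2) -> Fraction:
    """Sum of base^{-length(c)} over the distinct code strings c."""
    return sum((Fraction(1, base ** len(c)) for c in set(codes)), Fraction(0))


fruit_codes = {"0", "10", "11"}   # carrot, apple, orange from the example
overlapping = {"0", "1", "00"}    # "0" is a proper prefix of "00"
```

For `fruit_codes` the code is prefix free and the Kraft sum is exactly 1; for `overlapping` the check fails and the sum is 5/4, which exceeds the bound that Kraft's inequality guarantees for prefix-free codes.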

Definition 9 (Computable Representation). We say that a code is a computable 
representation if the map from code-strings to outcomes is a Turing machine. 

In the definition below we provide the formula for how a binary representation 
of the letters in an alphabet leads to a choice of a distribution. It is easily extended 
to non-binary representations. 

Definition 10 (Distribution from representation). Given a binary prefix-free code
for $A$ (our possible outcomes), the expression

$$w_a = \sum_{c \text{ code for } a} 2^{-\mathrm{length}(c)}$$

defines a measure over $A$.
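The weights of Definition 10 can be computed directly for the fruit and vegetable example. In this sketch the dictionary encoding of a code (code string mapped to outcome) and the `base` parameter for non-binary alphabets are assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict


def weights(code_map: Dict[str, str], base: int = 2) -> Dict[str, Fraction]:
    """w_a = sum over code strings c for outcome a of base^{-length(c)}."""
    w = defaultdict(Fraction)  # Fraction() is 0
    for code, outcome in code_map.items():
        w[outcome] += Fraction(1, base ** len(code))
    return dict(w)


# Binary representation: first bit separates vegetable (0) from fruit (1).
binary_code = {"0": "carrot", "10": "apple", "11": "orange"}
# Ternary representation: each object gets its own letter.
ternary_code = {"a": "apple", "o": "orange", "c": "carrot"}

binary_prior = weights(binary_code)
ternary_prior = weights(ternary_code, base=3)
```

The binary code gives carrot weight 1/2 and apple and orange weight 1/4 each, while the ternary code gives every object weight 1/3, matching the two representations discussed in the text.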

Though the formula in Definition 10 uniquely determines the weights given a
representation, there is still a very wide choice of representations. We deal
with this concern by restricting ourselves to the class of universal
representations, which have the property that given any other computable
representation, the universal weights are at least a constant times the weights
resulting from the other representation. See [Sol60, LV08, Hut05] for a more
extensive treatment. These universal representations are defined by having a
universal Turing machine (in our case the given reference machine) as the map
from codes to outcomes.
Definition 11 (Universal Representation). If a universal Turing machine is used
for defining the map from codes to outcomes we say that we have a universal
(computable) representation.

The weights $w^U_a$ that result from using a universal representation satisfy the
property that if $w_a$ are the resulting weights from another computable
representation, then there is $C > 0$ such that
$w^U_a \geq C w_a\ \forall a \in A$. This follows directly from the universality
of the Turing machine, which means that any other Turing machine can be simulated
on the universal one by adding an extra prefix (interpreter) $i$ to each code.
That is, feeding $ic$ to the universal machine gives the same output as feeding
$c$ to the other machine. The constant $C$ is $2^{-\mathrm{length}(i)}$.
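The dominance argument can be written out as a one-line derivation. The notation is an assumption made here for the illustration: $w^U_a$ denotes the weights from the universal representation $U$, $w_a$ the weights from the other computable representation, and $i$ the interpreter prefix for that representation on $U$.

```latex
w^U_a
  \;=\; \sum_{c' \,:\, c' \text{ code for } a \text{ on } U} 2^{-\mathrm{length}(c')}
  \;\geq\; \sum_{c \,:\, c \text{ code for } a} 2^{-\mathrm{length}(ic)}
  \;=\; 2^{-\mathrm{length}(i)} \sum_{c \,:\, c \text{ code for } a} 2^{-\mathrm{length}(c)}
  \;=\; 2^{-\mathrm{length}(i)} \, w_a .
```

The inequality holds because every code $c$ of the other representation yields a code $ic$ on the universal machine, and the universal machine may have further codes for $a$ that only add non-negative terms.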

Theorem 12. Applying Definition 10 together with a representation of finite
strings based on a universal Turing machine gives us the Solomonoff semi-measure.

Proof. Given a universal Turing machine $U$ we create a set of codes $C$ from all
programs that generate an output of at least $h$ bits. We let the code $c \in C$
represent the finite string $x \in X^*$ with $l(x) = h$ if $U(c) = x*$. We show
below that this representation together with Definition 10 leads to the
Solomonoff distribution for the next $h$ bits. By considering all $h \geq 1$ we
recover the Solomonoff semi-measure over $X^*$.

Formally, given $x \in X^*$ we let (in Definition 10) $a = x$ and we define
$p(x) := w_a$ and conclude that

$$p(x) = \sum_{p \,:\, U(p) = x*} 2^{-l(p)}$$

which is the Solomonoff semi-measure. □

Remark 13 (Unique Representation). Given a universal Turing machine, we could
choose to let only the shortest program that generates a certain output represent
that output, and not all the programs that generate this output. The length of
the shortest program $p$ that gives output $x$ is called the Kolmogorov
complexity $K(x)$ of $x$. Using only the shortest program leads to the slightly
different weights

$$w_x = 2^{-K(x)}$$

compared to Definition 10. Both weighting schemes are, however, equivalent within
a multiplicative constant [LV08].



5 Sequence Prediction 

We will in this section summarize how Solomonoff Induction as described in [Hut07]
follows from what we have presented in Section 3 and Section 4 together with our
fourth principle of time consistency. Consider a binary sequence that is revealed
to us one bit at a time. We are trying to predict the future of the sequence,
either one bit, several bits or all of them. By combining the conclusions of
Sections 3 and 4 we can define a sequence prediction algorithm which turns out to
be Solomonoff Induction. The results from Section 3 tell us that if we are going
to be able to make rational guesses about which computable sequence we will see,
we need to have probabilistic beliefs.

If we are interested in predicting a finite number of bits we need to design the
reward structure in Section 3 to reflect what we are interested in. If we want to
predict the next bit we can let $R_{i,j} = 1$ if $T_i$ and $T_j$ have the same
next bit and $R_{i,j} = -1$ otherwise. This leads to (a weighted majority
decision to) predicting 1 if
$\sum_{j \mid T_j \text{ produces } 1} p_j > \sum_{j \mid T_j \text{ produces } 0} p_j$
and 0 if the reverse inequality is true. The reasoning and result generalize
naturally to predicting finitely many bits and we can interpret this as
minimizing the expected number of errors.
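The weighted majority rule can be sketched as follows. The hypothesis outputs and weights are hypothetical, each program is represented only by a finite output string, and returning "0" on an exact tie is an assumption, since the text only specifies the two strict cases.

```python
from fractions import Fraction
from typing import Sequence


def predict_bit(outputs: Sequence[str], weights: Sequence[Fraction], position: int) -> str:
    """Predict 1 if the weight of programs producing 1 exceeds that of those producing 0."""
    mass = {"0": Fraction(0), "1": Fraction(0)}
    for out, p_j in zip(outputs, weights):
        mass[out[position]] += p_j
    return "1" if mass["1"] > mass["0"] else "0"  # a tie falls back to "0" (assumption)


def expected_reward(outputs, weights, position: int, guess: str) -> Fraction:
    """Expected reward with R = +1 when the guessed bit matches and -1 otherwise."""
    return sum(
        (p_j if out[position] == guess else -p_j for out, p_j in zip(outputs, weights)),
        Fraction(0),
    )


# Hypothetical program outputs T_j and beliefs p_j.
outputs = ["0000", "0101", "0011", "1100"]
weights = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 8), Fraction(1, 8)]

first = predict_bit(outputs, weights, 0)
second = predict_bit(outputs, weights, 1)
```

For the bit at index 1 the programs producing 1 carry weight 5/8, so `second` is `"1"` and its expected reward is 5/8 - 3/8 = 1/4, while the opposite guess has expected reward -1/4.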

5.1 Updating 

Suppose that we have observed a number of bits of the sequence. This results in
contradictions with many of the candidate sequences and they can be ruled out. We
next formally state the fourth principle from the introduction.

Definition 14 (Time-consistency). Suppose that we are observing a sequence
$x_1, x_2, \ldots$ one bit at a time ($x_t$ at time $t$). Suppose that we (at
time $t$) want to predict the next $h$ bits of a sequence and our decisions (for
any $t$ and $h$) are defined by a function $z_t^h$ from the set of all reward
structures ($\mathbb{R}^{m \times m}$ where $m = 2^h$ in the binary case) to the
set of strings of length $h$.

Suppose that $z_t^{h+1}(r) = y$ and $y$ starts with $x_{t+1}$. If it then follows
that $z_{t+1}^{h}(r') = y$, where $r'$ is the restriction of $r$ to the strings
that start with $x_{t+1}$ (and we identify such a string of length $h+1$ with the
string of length $h$ that follows the first bit), and if this implication is true
for any $t, r, h$, we say that we have time-consistency.

Theorem 15. Suppose that we have a semi-measure $p : X^* \to [0, 1]$ and that we
at time $0$ (given any loss $L$) predict the next $h$ bits according to

$$\arg\min_{y_1} \sum_{y_2} L(y_1, y_2)\, p(y_2). \qquad (12)$$

If we furthermore assume time-consistency and observe $x \in X^*$, then we predict

$$\arg\min_{y_1} \sum_{y_2} L(y_1, y_2)\, p(x y_2 \mid x). \qquad (13)$$

Proof. Suppose that there are $y_1$, $y_2$ and $x$ such that
$\frac{p(x y_1 \mid x)}{p(x y_2 \mid x)} \neq \frac{p(x y_1)}{p(x y_2)}$. This
obviously contradicts time-consistency. In other words, time-consistency implies
that relative beliefs in strings that are not yet contradicted remain the same.
Therefore, the decision function after seeing $x$ can be described by a
semi-measure where the inconsistent alternatives have been ruled out and the
others just renormalized. This is what (13) is describing. The only remaining
point to make is that we have expressed (12) and (13) in terms of loss instead of
reward, though it is simply a matter of changing the sign and max for min. □
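The update described in the proof, ruling out contradicted alternatives and renormalizing the rest, can be sketched in a few lines. The representation of beliefs as a dictionary from finite output strings to weights, and the example numbers, are assumptions for illustration only.

```python
from fractions import Fraction
from typing import Dict


def condition(beliefs: Dict[str, Fraction], observed: str) -> Dict[str, Fraction]:
    """Drop hypotheses contradicted by the observed prefix and renormalize the others."""
    consistent = {out: w for out, w in beliefs.items() if out.startswith(observed)}
    total = sum(consistent.values(), Fraction(0))
    if total == 0:
        raise ValueError("every hypothesis is contradicted by the observation")
    return {out: w / total for out, w in consistent.items()}


# Hypothetical prior over four candidate output sequences.
prior = {
    "0000": Fraction(1, 2),
    "0101": Fraction(1, 4),
    "0011": Fraction(1, 8),
    "1100": Fraction(1, 8),
}

posterior = condition(prior, "00")
ratio_before = prior["0000"] / prior["0011"]
ratio_after = posterior["0000"] / posterior["0011"]
```

After observing `"00"` only two hypotheses survive, with weights 4/5 and 1/5. Their ratio is 4 both before and after the update, which is the invariance of relative beliefs that time-consistency demands.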



6 The AIXI Agent 

In this section we discuss extensions to the case where an agent is choosing a sequence 
of actions that affect the environment it is in. We will simply replace the principle 
that says that we predict computable sequences by one that says that we predict 
computable environments. The environments are such that the agent takes an action 
that is fed to the environment and the environment responds with an output that we 
call a perception. There is a finite alphabet for the action and one for the perception. 

Our aim is to choose a policy for the agent. This is a function from the history
of the actions and perceptions that have appeared so far to the action which the
agent chooses next. Suppose that we have a class $\{\pi_i\}$ of policies, a class
of (all) computable environments $\{T_j\}$ and a reward structure $R_{i,j}$ which
is the total reward for using policy $\pi_i$ in environment $T_j$. To assume the
property that $\lim_{j \to \infty} R_{i,j} = 0\ \forall i$ would mean that we
assume that the stakes are lower in the environments of high index. This is
somewhat restrictive and there are alternatives to making this assumption (that
the reward structure is in $c_0$). In a separate article [SH11] on rationality
axioms we investigate the result of assuming that we instead have the larger
space $\ell_\infty$ (see Remark 4). The conclusion is that we get finite
additivity instead of countable additivity for the probability measure, but that
we can get back to countable additivity by adding an extra monotonicity
assumption. The arguments in Section 3 imply (given a $c_0$ reward structure)
that we must assign probabilities $\{p_j\}$ for the environment being $T_j$ and
choose a policy with index

$$\arg\max_i \sum_j R_{i,j}\, p_j. \qquad (14)$$



This is what the AIXI agent described in [Hut05] is doing. The AIXI choice of
weights $p_j$ corresponds to the choice $2^{-K(\nu)}$ (as in Remark 13), but for
the class of lower semi-computable $\nu$ discussed below in Section 7.

The same updating technique as in Section 5, where we eliminate the environments
which are inconsistent with what has occurred, is being used. This is deduced
from the same time-consistency principle as for sequence prediction, just stating
that the relative belief in environments that are still consistent will remain
unchanged. This leads to the AIXI agent from [Hut05].
7 Remarks on Stochastic Lower Semi-computable Environments



Having the belief that the environment is computable does seem like a restrictive
assumption, though we will here argue that it is in an interesting way equivalent
to having beliefs over all lower semi-computable stochastic environments. The
Solomonoff prior is based on having belief $2^{-l(p)}$ in having input program
$p$ defining the environment. We can, however, rewrite this prior as a mixture
$\sum_\nu w_\nu \nu$ over all lower semi-computable environments $\nu$ where
$w_\nu > 0$ for all $\nu$ (proven up to a multiplicative factor in [LV08] and as
an exact identity in [WSH11]). Therefore, acting according to our Solomonoff
mixture over computable environments is identical to acting according to beliefs
over a much larger set of environments where we have randomness.



8 Conclusions 

We defined four principles for universal sequence prediction and showed that 
Solomonoff induction and AIXI are determined from them. These principles are 
computability, rationality, indifference and time consistency. Computability tells 
us that Turing machines are the explanations we consider for what we are seeing. 
Rationality tells us that we have probabilistic beliefs over these. Time-consistency 
leads to the conclusion that we update these beliefs based on conditional probability 
and the principle of indifference tells us how to choose the original beliefs based on
how compactly the various Turing machines can be implemented on the reference 
machine. 